Summarizing and understanding large graphs

Authors

  • Danai Koutra
  • U. Kang
  • Jilles Vreeken
  • Christos Faloutsos
Abstract

How can we succinctly describe a million-node graph with a few simple sentences? Given a large graph, how can we find its most ‘important’ structures, so that we can summarize it and easily visualize it? How can we measure the ‘importance’ of a set of discovered subgraphs in a large graph? Starting with the observation that real graphs often consist of stars, bipartite cores, cliques and chains, our main idea is to find the most succinct description of a graph in these ‘vocabulary’ terms. To this end, we first mine candidate subgraphs using one or more graph partitioning algorithms. Next, we identify the optimal summarization using the Minimum Description Length (MDL) principle, picking only those subgraphs from the candidates that together yield the best lossless compression of the graph—or, equivalently, that most succinctly describe its adjacency matrix. Our contributions are three-fold: (a) formulation: we provide a principled encoding scheme to identify the vocabulary type of a given subgraph for six structure types prevalent in real-world graphs, (b) algorithm: we develop VOG, an efficient method to approximate the MDL-optimal summary of a given graph in terms of local graph structures, and (c) applicability: we report an extensive empirical evaluation on multi-million-edge real graphs, including Flickr and the Notre Dame web graph.
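The selection step described in the abstract, keeping only those candidate subgraphs that together shorten the description of the graph, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the bit costs in `total_cost` are a toy two-part cost and not the encoding scheme from the paper, the candidates are written by hand instead of being mined by a graph partitioning algorithm, only three structure types are modelled, and the greedy pass in `greedy_summary` is just one plausible way to approximate the best subset.

```python
from itertools import combinations
from math import log2


def norm(u, v):
    """Store an undirected edge as an ordered pair."""
    return (u, v) if u < v else (v, u)


def structure_edges(kind, nodes):
    """Edges that a structure asserts (hypothetical helper).

    For a star the first node is the hub; for a chain the nodes are in path order.
    """
    if kind == "clique":
        return {norm(u, v) for u, v in combinations(nodes, 2)}
    if kind == "star":
        hub, *spokes = nodes
        return {norm(hub, s) for s in spokes}
    if kind == "chain":
        return {norm(u, v) for u, v in zip(nodes, nodes[1:])}
    raise ValueError(f"unknown structure type: {kind}")


def total_cost(model, edges, n):
    """Toy two-part cost in bits: model cost plus error cost (not the paper's encoding)."""
    id_bits = log2(n)          # bits to name one node
    pair_bits = 2 * log2(n)    # bits to name one node pair
    # Each structure pays for its type (3 toy types) and for listing its nodes.
    model_bits = sum(log2(3) + len(nodes) * id_bits for _, nodes in model)
    asserted = set()
    for kind, nodes in model:
        asserted |= structure_edges(kind, nodes)
    # Error: edges asserted but absent, plus real edges left unexplained.
    error = asserted ^ edges
    return model_bits + len(error) * pair_bits


def greedy_summary(candidates, edges, n):
    """Add a candidate only if it lowers the total cost (assumed heuristic)."""
    model = []
    best = total_cost(model, edges, n)
    # Try the candidates that explain the most real edges first.
    ordered = sorted(
        candidates,
        key=lambda c: len(structure_edges(*c) & edges),
        reverse=True,
    )
    for cand in ordered:
        cost = total_cost(model + [cand], edges, n)
        if cost < best:
            model.append(cand)
            best = cost
    return model, best


# Toy graph on 10 nodes: a 5-clique, a star with hub 5, and one stray edge.
n = 10
edges = structure_edges("clique", (0, 1, 2, 3, 4))
edges |= structure_edges("star", (5, 6, 7, 8, 9))
edges.add(norm(4, 5))

candidates = [
    ("chain", (0, 1, 2)),          # redundant once the clique is chosen
    ("clique", (5, 6, 7, 8, 9)),   # poor fit: asserts edges that do not exist
    ("star", (5, 6, 7, 8, 9)),
    ("clique", (0, 1, 2, 3, 4)),
]

summary, bits = greedy_summary(candidates, edges, n)
print(summary, round(bits, 2))
```

On this toy input the sketch keeps the 5-clique and the star, and rejects the ill-fitting clique and the redundant chain because each would raise the total cost. The stray edge stays in the error part, which is what keeps the description lossless.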


Similar resources

Summarizing Static and Dynamic Big Graphs

Large-scale, highly-interconnected networks pervade our society and the natural world around us, including the World Wide Web, social networks, knowledge graphs, genome and scientific databases, medical and government records. The massive scale of graph data often surpasses the available computation and storage resources. Besides, users get overwhelmed by the daunting task of understanding and ...


Discovery of Rare Sequential Topic Patterns in Document Stream

When and Where: Predicting Human Movements Based on Social Spatial-Temporal Events Ning Yang*, Sichuan University; Xiangnan Kong, University of Illinois at Chicago; Fengjiao Wang, University of Illinois at Chicago; Philip Yu, University of Active Multitask Learning Using Both Latent and Supervised Shared Topics Ayan Acharya*, University of Texas at Austin; Raymond Mooney, University of Texas at...


Summarizing Answer Graphs Induced by Keyword Queries

Keyword search has been popularly used to query graph data. Due to the lack of structure support, a keyword query might generate an excessive number of matches, referred to as “answer graphs”, that could include different relationships among keywords. An ignored yet important task is to group and summarize answer graphs that share similar structures and contents for better query interpretation ...


On Summarizing Large-Scale Dynamic Graphs

How can we describe a large, dynamic graph over time? Is it random? If not, what are the most apparent deviations from randomness – a dense block of actors that persists over time, or perhaps a star with many satellite nodes that appears with some fixed periodicity? In practice, these deviations indicate patterns – for example, research collaborations forming and fading away over the years. Whi...


VOG: Summarizing and Understanding Large Graphs

How can we succinctly describe a million-node graph with a few simple sentences? How can we measure the ‘importance’ of a set of discovered subgraphs in a large graph? These are exactly the problems we focus on. Our main ideas are to construct a ‘vocabulary’ of subgraph-types that often occur in real graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the most succinct desc...


Generating examples of paths summarizing RDF datasets

As datasets become too large to be comprehended directly, a need for data summarization arises. A data summary can present typical patterns commonly found in a dataset, from which high-level understanding of the data can be obtained. Nonetheless, such abstract understanding can be improved by providing concrete examples of the summary patterns. If possible, the chosen examples should be diverse...



Journal:
  • Statistical Analysis and Data Mining

Volume 8, Issue

Pages -

Publication date: 2015